Univariate Stock Predictions: LSTM, ARIMA, and Prophet

INFO 523 - Final Project

Project description

Author: Matt Osterhoudt

Affiliation: College of Information Science, University of Arizona

Abstract

Time series models can be used to predict stock prices from historical closing values. Using stock data imported from Yahoo Finance, we forecast MSFT, SPY, and SWPPX and compare three models: ARIMA (AutoRegressive Integrated Moving Average), LSTM (Long Short-Term Memory neural network), and Facebook Prophet. Each model has its strengths, but the LSTM model proved the most effective, with a high R-squared value across the three stocks (average of 0.987). The LSTM model also had substantially lower Mean Squared Error and Mean Absolute Error than ARIMA and Prophet. This performance suggests that LSTM may be more effective for moderately long stock predictions and illustrates the power of recurrent neural networks.

Introduction/Question

Driving question: Which time series model (ARIMA, LSTM, or Prophet) is best for univariate (daily closing price) stock price prediction? This project aims to identify the most effective model using three different stock datasets: MSFT (Microsoft), SPY (SPDR S&P 500 ETF Trust), and SWPPX (Schwab S&P 500 Index Fund). The data ranges from the beginning of 2015 to the end of 2024. The only variables used are the closing price and the date index. “Close” is the closing price of the stock on each trading day, and the date index is in yyyy-mm-dd format. Our objective is to clarify which of these models is best suited to predicting stock prices.

Approach

First, I extracted the stock data using the yfinance package, which retrieves stock data from Yahoo Finance. I extracted daily historical data from 2015 to 2024 and retained the “Close” price series and the date index for univariate analysis. I did not find it necessary to preprocess much of the data: there were no outliers I wanted to remove, and the only missing dates were holidays and weekends, which is expected because markets are closed on those days. Feature scaling and normalization were performed within individual time series models where needed. I selected these time series models to deepen my understanding of time series analysis.
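
The retrieval step could look roughly like the sketch below. It assumes the `yfinance` package and its `Ticker.history` call for the download. The function names `fetch_history` and `to_univariate`, the date strings, and the small demo frame are illustrative assumptions, not the project's actual code.

```python
import pandas as pd


def fetch_history(ticker, start="2015-01-01", end="2025-01-01"):
    """Download daily history for one ticker (needs network access)."""
    # Imported lazily so the offline helper below works without yfinance.
    import yfinance as yf

    return yf.Ticker(ticker).history(start=start, end=end)


def to_univariate(history):
    """Keep only the Close column, ordered by date."""
    return history[["Close"]].sort_index()


# Offline demonstration on a tiny hand-made frame (made-up numbers,
# deliberately out of date order).
demo = pd.DataFrame(
    {
        "Open": [10.0, 11.0, 12.0],
        "Close": [10.5, 11.5, 12.5],
        "Volume": [100, 200, 300],
    },
    index=pd.to_datetime(["2015-01-05", "2015-01-02", "2015-01-06"]),
)
demo_close = to_univariate(demo)
```

With network access, `to_univariate(fetch_history("MSFT"))` would be the corresponding call for one of the three tickers.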

ARIMA Approach & Analysis

First, I chose ARIMA (AutoRegressive Integrated Moving Average) for its ability to capture autocorrelation patterns once differencing has been applied. ARIMA assumes a stationary series, so I applied the ADF (Augmented Dickey-Fuller) test to the raw and the differenced data. The null hypothesis of the ADF test is that the series is non-stationary. If the p-value is below the significance level (0.05), we may reject the null, which suggests that the series is stationary. If it is above the significance level, the series must be differenced, and the number of differences required gives d. For all three stocks the raw series had p-values of roughly 0.78 to 0.85, while the first-differenced series had p-values near zero. ARIMA also requires ACF (autocorrelation) and PACF (partial autocorrelation) analysis to select the lags p and q; both were plotted, with the PACF guiding p and the ACF guiding q. Configuring ARIMA with suitable parameters (p, d, q) helps fit the model properly. Each stock’s data was split 80/20 into training and test sets. Predicted values were overlaid on the actual test data for visual inspection.

The ARIMA model did not perform as well as I expected. Despite the p, d, and q selection tests, the model fell quite short. For all three stocks it plotted a nearly straight-line “prediction” that seems to have simply taken an average. This tells me that the model failed to make any meaningful predictions, or that seasonality played an unexpected role. Limitations and future work: this could be addressed by using SARIMA or an automated order search such as auto-ARIMA. I can also check my model more thoroughly for anything extraneous.
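
As a rough sketch of the preparation steps described above, the code below performs a chronological 80/20 split and computes the sample autocorrelation before and after first differencing. The series is a synthetic random walk, not the project's data, and its length of 2516 rows is an assumption chosen so that the split reproduces the 2012 and 504 row counts seen in the output. The helper names are hypothetical. `adf_pvalue` relies on the `adfuller` function from statsmodels and is imported lazily, so the rest runs without that package.

```python
import numpy as np
import pandas as pd


def chronological_split(series, train_frac=0.8):
    """Split without shuffling: the first train_frac of rows is training."""
    cut = int(len(series) * train_frac)
    return series.iloc[:cut], series.iloc[cut:]


def sample_acf(values, max_lag):
    """Sample autocorrelation at lags 0..max_lag."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    acf = [1.0]
    for lag in range(1, max_lag + 1):
        acf.append(np.dot(x[:-lag], x[lag:]) / denom)
    return np.array(acf)


def adf_pvalue(series):
    """p-value of the Augmented Dickey-Fuller test (needs statsmodels)."""
    from statsmodels.tsa.stattools import adfuller  # lazy import

    return adfuller(series.dropna())[1]


# Synthetic random walk standing in for a closing-price series.
rng = np.random.default_rng(0)
prices = pd.Series(100.0 + np.cumsum(rng.normal(0.0, 1.0, size=2516)))

train, test = chronological_split(prices)
acf_levels = sample_acf(train, max_lag=5)                # decays very slowly
acf_diffs = sample_acf(train.diff().dropna(), max_lag=5)  # close to zero
```

A slowly decaying autocorrelation in the levels and a near-zero one in the differences is the pattern that points to d = 1. Comparing `adf_pvalue(train)` and `adf_pvalue(train.diff())` against 0.05 would be the matching formal check.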

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 504 entries, 2022-12-29 00:00:00-05:00 to 2024-12-31 00:00:00-05:00
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Close   504 non-null    float64
dtypes: float64(1)
memory usage: 7.9 KB
p-value pre-difference: 0.8514689056264022
p-value post-difference: 8.88111660840834e-28
                               SARIMAX Results                                
==============================================================================
Dep. Variable:                  Close   No. Observations:                 2012
Model:                 ARIMA(9, 1, 9)   Log Likelihood               -5023.263
Date:                Thu, 21 Aug 2025   AIC                          10086.526
Time:                        04:56:23   BIC                          10198.653
Sample:                             0   HQIC                         10127.685
                               - 2012                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.0986      0.050      1.958      0.050      -0.000       0.197
ar.L1          0.0443      0.073      0.609      0.543      -0.098       0.187
ar.L2         -0.1120      0.071     -1.566      0.117      -0.252       0.028
ar.L3          0.1277      0.068      1.876      0.061      -0.006       0.261
ar.L4         -0.0957      0.058     -1.660      0.097      -0.209       0.017
ar.L5         -0.0871      0.069     -1.265      0.206      -0.222       0.048
ar.L6         -0.1014      0.068     -1.486      0.137      -0.235       0.032
ar.L7          0.1083      0.066      1.639      0.101      -0.021       0.238
ar.L8          0.0820      0.058      1.424      0.154      -0.031       0.195
ar.L9          0.6370      0.062     10.217      0.000       0.515       0.759
ma.L1         -0.1404      0.076     -1.851      0.064      -0.289       0.008
ma.L2          0.1138      0.074      1.529      0.126      -0.032       0.260
ma.L3         -0.1709      0.070     -2.457      0.014      -0.307      -0.035
ma.L4          0.1071      0.062      1.715      0.086      -0.015       0.229
ma.L5          0.0928      0.072      1.286      0.198      -0.049       0.234
ma.L6          0.0224      0.071      0.314      0.754      -0.118       0.162
ma.L7         -0.0729      0.070     -1.046      0.296      -0.209       0.064
ma.L8         -0.1714      0.061     -2.828      0.005      -0.290      -0.053
ma.L9         -0.5071      0.068     -7.431      0.000      -0.641      -0.373
sigma2         8.6792      0.140     61.936      0.000       8.405       8.954
===================================================================================
Ljung-Box (L1) (Q):                   0.14   Jarque-Bera (JB):              4169.21
Prob(Q):                              0.71   Prob(JB):                         0.00
Heteroskedasticity (H):              44.67   Skew:                            -0.55
Prob(H) (two-sided):                  0.00   Kurtosis:                         9.97
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

p-value pre-difference: 0.7797886330097034
p-value post-difference: 3.7231408505223645e-26
                               SARIMAX Results                                
==============================================================================
Dep. Variable:                  Close   No. Observations:                 2012
Model:                 ARIMA(9, 1, 6)   Log Likelihood               -5249.113
Date:                Thu, 21 Aug 2025   AIC                          10532.226
Time:                        04:56:32   BIC                          10627.534
Sample:                             0   HQIC                         10567.212
                               - 2012                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.0953      0.076      1.251      0.211      -0.054       0.245
ar.L1         -0.9847      0.181     -5.428      0.000      -1.340      -0.629
ar.L2         -0.1819      0.127     -1.428      0.153      -0.432       0.068
ar.L3          0.1249      0.094      1.330      0.184      -0.059       0.309
ar.L4         -0.5781      0.081     -7.163      0.000      -0.736      -0.420
ar.L5         -1.0178      0.158     -6.433      0.000      -1.328      -0.708
ar.L6         -0.6107      0.116     -5.286      0.000      -0.837      -0.384
ar.L7          0.0332      0.022      1.544      0.123      -0.009       0.075
ar.L8         -0.0174      0.027     -0.637      0.524      -0.071       0.036
ar.L9          0.0230      0.027      0.845      0.398      -0.030       0.076
ma.L1          0.9180      0.181      5.082      0.000       0.564       1.272
ma.L2          0.1389      0.119      1.164      0.245      -0.095       0.373
ma.L3         -0.1010      0.091     -1.107      0.268      -0.280       0.078
ma.L4          0.5618      0.076      7.356      0.000       0.412       0.711
ma.L5          0.9751      0.149      6.543      0.000       0.683       1.267
ma.L6          0.5229      0.103      5.071      0.000       0.321       0.725
sigma2        10.7461      0.181     59.397      0.000      10.391      11.101
===================================================================================
Ljung-Box (L1) (Q):                   0.17   Jarque-Bera (JB):              3687.37
Prob(Q):                              0.68   Prob(JB):                         0.00
Heteroskedasticity (H):               9.58   Skew:                            -0.76
Prob(H) (two-sided):                  0.00   Kurtosis:                         9.46
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).

p-value pre-difference: 0.8497185460483221
p-value post-difference: 3.048198719951158e-26
                               SARIMAX Results                                
==============================================================================
Dep. Variable:                  Close   No. Observations:                 2012
Model:                 ARIMA(2, 1, 2)   Log Likelihood                2245.182
Date:                Thu, 21 Aug 2025   AIC                          -4478.363
Time:                        04:56:37   BIC                          -4444.725
Sample:                             0   HQIC                         -4466.015
                               - 2012                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.0035      0.002      2.051      0.040       0.000       0.007
ar.L1         -1.7581      0.020    -89.125      0.000      -1.797      -1.719
ar.L2         -0.8848      0.019    -47.657      0.000      -0.921      -0.848
ma.L1          1.6795      0.025     66.306      0.000       1.630       1.729
ma.L2          0.7877      0.024     32.695      0.000       0.740       0.835
sigma2         0.0063   8.18e-05     76.791      0.000       0.006       0.006
===================================================================================
Ljung-Box (L1) (Q):                   0.21   Jarque-Bera (JB):              9640.99
Prob(Q):                              0.64   Prob(JB):                         0.00
Heteroskedasticity (H):              16.25   Skew:                            -0.05
Prob(H) (two-sided):                  0.00   Kurtosis:                        13.73
===================================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
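
One plausible reason for the near-linear forecasts is worth sketching. It assumes the forecast was produced in one shot over the whole test period, which the write-up does not state. In a model fitted on first differences, the forecast of each future daily change decays toward the average change as the horizon grows, so the forecast of the price itself approaches a straight line. The sketch below shows this for the simplest case, an AR(1) process on the differences. All numbers (`last_level`, `last_diff`, `mu`, `phi`) are made up for illustration and are not taken from the fitted models.

```python
import numpy as np


def forecast_levels(last_level, last_diff, mu, phi, horizon):
    """Multi-step forecast when the daily change follows an AR(1) process.

    The h-step-ahead forecast of the change is mu + phi**h * (last_diff - mu),
    and the level forecast is the running sum of those changes.
    """
    steps = np.arange(1, horizon + 1)
    diff_forecasts = mu + (phi ** steps) * (last_diff - mu)
    return last_level + np.cumsum(diff_forecasts)


# Hypothetical values: last price 240, last daily change -3,
# long-run average change 0.1 per day, AR coefficient 0.5.
path = forecast_levels(last_level=240.0, last_diff=-3.0, mu=0.1, phi=0.5,
                       horizon=504)
increments = np.diff(path)  # forecast daily changes from day 2 onward
```

After the first few weeks every entry of `increments` is essentially `mu`, so the remaining forecast is a straight line with that slope. Re-fitting or updating the model as each new observation arrives (a rolling one-step forecast) would avoid this flattening, at the cost of answering a different forecasting question.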

LSTM Approach & Analysis

Next, I selected LSTM for its capability of modeling sequential data (in this case, a time series of stock prices). LSTM is not a linear model; instead, it learns patterns through memory cells that carry state across time steps. I normalized the data for efficiency and to prevent scaling issues. Using TensorFlow, I set up a 60-day lookback window: the model uses the previous 60 days to predict a single day’s stock price, and this window is rolled over the entire series. The sequential model consists of an LSTM layer with 50 units, a dropout layer, a dense layer of 8 neurons using the Rectified Linear Unit (ReLU) activation function, and a single-neuron output layer. The model is trained with a batch size of 32 samples and passes through the training set 20 times (20 epochs). I am not (yet) particularly well-versed in neural networks, so many of these settings (20 epochs, batch size of 32, 50 LSTM units, etc.) are common default values selected based on conventional practice. Because the model is trained over several epochs, I also included a ModelCheckpoint that keeps the best-performing model. The data was again split 80/20 for training and testing.

The models perform very well. Visually, the predicted values are very closely aligned with the actual values. I believe the LSTM models track the data so closely because each prediction is made from the 60 most recent prices, rolling forward one day at a time.
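
The windowing and architecture described above might be sketched as follows. This is an approximation under stated assumptions, not the project's code. Min-max scaling fitted on the training portion, the dropout rate of 0.2, the Adam optimizer, and the mean-squared-error loss are all assumed, since the write-up does not specify them. The helper names are hypothetical, and TensorFlow is imported lazily so the windowing helpers run without it.

```python
import numpy as np


def minmax_fit(train_values):
    """Return the (min, max) of the training data only."""
    return float(np.min(train_values)), float(np.max(train_values))


def minmax_apply(values, lo, hi):
    """Scale values so that the training range maps onto [0, 1]."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)


def make_windows(values, lookback=60):
    """Build (samples, lookback, 1) inputs and the next-day targets."""
    values = np.asarray(values, dtype=float)
    n_samples = len(values) - lookback
    X = np.stack([values[i:i + lookback] for i in range(n_samples)])
    y = values[lookback:]
    return X[..., np.newaxis], y


def build_lstm(lookback=60, dropout_rate=0.2):
    """LSTM(50) -> Dropout -> Dense(8, relu) -> Dense(1); needs TensorFlow."""
    import tensorflow as tf  # lazy import

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(lookback, 1)),
        tf.keras.layers.LSTM(50),
        tf.keras.layers.Dropout(dropout_rate),  # assumed rate
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    # Optimizer and loss are assumptions; the write-up does not name them.
    model.compile(optimizer="adam", loss="mean_squared_error")
    return model


# Toy series 0..99 so the window contents are easy to check by eye.
series = np.arange(100.0)
lo, hi = minmax_fit(series[:80])          # fit the scaler on the first 80%
scaled = minmax_apply(series, lo, hi)
X, y = make_windows(scaled, lookback=60)  # X: (40, 60, 1), y: (40,)
```

Training would then be a call along the lines of `model.fit(X_train, y_train, epochs=20, batch_size=32, ...)` with a `tf.keras.callbacks.ModelCheckpoint(..., save_best_only=True)` callback. By default that callback monitors validation loss, so it needs validation data to decide which epoch was best.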

Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ lstm (LSTM)                     │ (None, 50)             │        10,400 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout (Dropout)               │ (None, 50)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 8)              │           408 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 1)              │             9 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 10,817 (42.25 KB)
 Trainable params: 10,817 (42.25 KB)
 Non-trainable params: 0 (0.00 B)

Model: "sequential_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ lstm_1 (LSTM)                   │ (None, 50)             │        10,400 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_1 (Dropout)             │ (None, 50)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 8)              │           408 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_3 (Dense)                 │ (None, 1)              │             9 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 10,817 (42.25 KB)
 Trainable params: 10,817 (42.25 KB)
 Non-trainable params: 0 (0.00 B)

Model: "sequential_2"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ lstm_2 (LSTM)                   │ (None, 50)             │        10,400 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dropout_2 (Dropout)             │ (None, 50)             │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_4 (Dense)                 │ (None, 8)              │           408 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_5 (Dense)                 │ (None, 1)              │             9 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 10,817 (42.25 KB)
 Trainable params: 10,817 (42.25 KB)
 Non-trainable params: 0 (0.00 B)
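
The parameter counts in the summaries above can be checked by hand. An LSTM layer has four gates, each with input weights, recurrent weights, and a bias. A dense layer has one weight per input-output pair plus one bias per output. The short sketch below reproduces the reported numbers, assuming a single input feature (the closing price) and 4-byte float parameters. The function names are illustrative.

```python
def lstm_params(units, input_dim):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(inputs, outputs):
    """One weight per input-output pair plus one bias per output."""
    return inputs * outputs + outputs


layer_params = {
    "lstm": lstm_params(units=50, input_dim=1),  # 4 * (50 * 51 + 50)
    "dense": dense_params(50, 8),                # 50 * 8 + 8
    "dense_1": dense_params(8, 1),               # 8 * 1 + 1
}
total_params = sum(layer_params.values())
size_kb = total_params * 4 / 1024  # 4 bytes per float32 parameter
```

The dropout layer contributes no parameters, which is why it shows 0 in each summary.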

Prophet Approach & Analysis

Finally, Prophet is the last model I implemented. Prophet expects a data frame with two columns: “ds” (the timestamp, here the date) and “y” (the target variable, here the daily closing price). My data was already close to this format; the only change I had to make was stripping the timezone. Unlike LSTM, Prophet is an additive model that combines a trend with seasonal components, and seasonality is a feature the model explicitly anticipates. The stock data is daily and spans about 10 years, so I set yearly seasonality to true and weekly and daily seasonality to false. Another setting is freq="B" (business-day frequency), which skips weekends when generating the dates to predict; this frequency does not by itself skip market holidays.

The performance of this model was interesting. Prophet’s predictive power seemed to be strong only for trends it had already observed. For SWPPX and SPY, I split the data 80/20, and the boundary between the training and test sets happened to fall at the tail end of a decline in price. The model saw the decreasing trend and continued to predict a decrease. For MSFT, however, I changed the split to 90/10, so the training data included some of the new upward trend past 2023. The MSFT model instead projected upwards, indicating that the model relied heavily on the most recent trend to predict values. Prophet also provides components that can be plotted; I plotted the trend and yearly components as well.
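
A minimal sketch of the data preparation and model settings described above is given below. It assumes the `prophet` package's `Prophet`, `fit`, `make_future_dataframe`, and `predict` calls. The helper names `to_prophet_frame` and `fit_and_forecast` and the five-day toy series are hypothetical. Prophet is imported lazily so the pandas part runs without it.

```python
import pandas as pd


def to_prophet_frame(close):
    """Build the two-column ds / y frame with timezone-naive timestamps."""
    ds = pd.DatetimeIndex(close.index).tz_localize(None)
    return pd.DataFrame({"ds": ds, "y": close.to_numpy()})


def fit_and_forecast(frame, periods):
    """Fit Prophet with yearly seasonality only; needs the prophet package."""
    from prophet import Prophet  # lazy import

    model = Prophet(
        yearly_seasonality=True,
        weekly_seasonality=False,
        daily_seasonality=False,
    )
    model.fit(frame)
    future = model.make_future_dataframe(periods=periods, freq="B")
    return model.predict(future)


# Toy timezone-aware series (made-up prices) covering one trading week.
idx = pd.date_range("2024-12-23", periods=5, freq="B", tz="America/New_York")
close = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx, name="Close")
frame = to_prophet_frame(close)

# Business-day frequency drops the weekend but keeps 25 December,
# a weekday on which US markets were closed.
business_days = pd.date_range("2024-12-23", "2024-12-29", freq="B")
```

Because the business-day frequency keeps weekday holidays, the forecast frame can contain a few dates with no matching trading day. Those rows would need to be dropped before comparing predictions with actual prices.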

Discussion & Model Comparison

Stock   Model     MSE         MAE       R-squared   NMSE     NMAE
MSFT    Prophet   4591.333    65.959    -12.971     11.021   0.158
MSFT    ARIMA     13585.598   105.661   -2.431      37.437   0.291
MSFT    LSTM      63.961      6.459     0.982       0.018    0.018
SWPPX   Prophet   18.707      3.719     -4.200      1.693    0.337
SWPPX   ARIMA     8.303       2.365     -1.056      0.742    0.211
SWPPX   LSTM      0.033       0.132     0.992       0.008    0.012
SPY     Prophet   21337.151   122.298   -4.163      45.381   0.260
SPY     ARIMA     10072.653   84.816    -1.219      21.212   0.179
SPY     LSTM      63.426      6.268     0.986       0.014    0.013
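
For reference, the unnormalized metrics can be computed as in the sketch below. It uses the standard definitions of MSE, MAE, and R-squared. The normalization behind NMSE and NMAE is not stated in the write-up, so those two are left out. The toy arrays are made up to show why a model can score a negative R-squared: it does worse than always predicting the mean of the actual values. The last line re-checks the LSTM average quoted in the abstract against the third numeric column of the table.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(diff ** 2))


def mae(y_true, y_pred):
    """Mean absolute error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot; negative when worse than predicting the mean."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


actual = np.array([1.0, 2.0, 3.0, 4.0])
mean_guess = np.full(4, actual.mean())  # always predict the average
reversed_guess = actual[::-1]           # predicts the trend backwards

r2_mean = r_squared(actual, mean_guess)
r2_reversed = r_squared(actual, reversed_guess)

# LSTM R-squared values for MSFT, SWPPX, and SPY from the table.
lstm_r2_average = float(np.mean([0.982, 0.992, 0.986]))
```

Read this way, the negative values for ARIMA and Prophet say that their test-period forecasts were further from the actual prices than a flat line at the test-set mean would have been.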

Conclusion & Limitations

Overall, LSTM was the best-performing model by far. It was able to map the temporal dependencies in the price data accurately. LSTM did, however, take the longest to compute, which should be considered; I did not measure the run time of each model, but LSTM was noticeably the slowest. There are certainly improvements that can be made. For example, with ARIMA or Prophet, testing month-to-month data over a single year might produce better results. I could also have used different stock data, and my comparison was limited to only three models. For ARIMA specifically, I could have used SARIMA or auto-ARIMA to search for the best parameters; I selected p, d, and q myself and may have gone amiss there. The analysis was also strictly univariate: I examined the closing price and nothing else. While the closing price is most likely a very important factor, a myriad of other factors could be at play as well. A multivariate analysis that incorporates more features might yield more interesting results.
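
Since run time was not measured, one lightweight way to record it in future work is a small timing helper built on the standard library, sketched below. The name `timed` and the toy workload are placeholders. In practice the `with` block would wrap each model's fit and predict calls.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock seconds spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("toy_workload", timings):
    # Stand-in for a model's fit and predict calls.
    total = sum(i * i for i in range(100_000))
```

Collecting one entry per model in `timings` would let run time be reported alongside the error metrics.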